Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications ›› 2009, Vol. 32 ›› Issue (6): 125-129. doi: 10.13190/jbupt.200906.125.tangl

• Research Report •

Dynamic Spectrum Allocation Algorithm Based on POMDP Reinforcement Learning

TANG Lun (唐伦); CHEN Qian-bin (陈前斌); ZENG Xiao-ping (曾孝平)

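  1. College of Communication Engineering, Chongqing University, Chongqing 400044, China
  2. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2008-10-05 Revised: 2009-08-31 Published: 2009-12-28 Online: 2009-12-28
  • Corresponding author: TANG Lun (唐伦)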

A Novel Dynamic Spectrum Allocation Algorithm Based on POMDP Reinforcement Learning

TANG Lun; CHEN Qian-bin; ZENG Xiao-ping

  1. College of Communication Engineering, Chongqing University, Chongqing 400044, China
  2. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2008-10-05 Revised: 2009-08-31 Online: 2009-12-28 Published: 2009-12-28
  • Contact: TANG Lun

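Abstract (translated from Chinese):

A dynamic spectrum allocation game model based on the VCG mechanism is proposed. It addresses the information constraints, the distributed nature, and the dynamic, complex spectrum allocation found in cognitive radio network environments. A partially observable Markov decision process (POMDP) reinforcement learning algorithm for dynamic spectrum allocation is also proposed. By observing and compiling statistics on historical information, cognitive users learn continually to raise the reward of their bidding strategy and thereby obtain the optimal bidding strategy. POMDP reinforcement learning is converted into optimal-policy learning for a belief-state Markov decision process (belief MDP), and the belief-state MDP model is solved with a value iteration algorithm. Simulation results show that the POMDP-based reinforcement learning algorithm significantly improves the behavior of cognitive users and the performance of dynamic spectrum allocation.

Keywords (translated from Chinese): dynamic spectrum allocation, VCG mechanism, cognitive network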
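The abstract names the VCG mechanism but does not give the auction rules, so the following is only a minimal sketch of how a VCG allocation could look under simplifying assumptions that are not taken from the paper: the channels are identical, each cognitive user wants at most one channel, and bids are sealed. Under those assumptions the VCG payment of every winner reduces to the highest losing bid. The function name `vcg_unit_demand` and its parameters are hypothetical.

```python
def vcg_unit_demand(bids, num_channels):
    """Return {winner_index: payment} for identical channels, unit demand.

    Assumption (not from the paper): every user bids for one channel and
    all channels are interchangeable.
    """
    # Rank users by bid, highest first; sorted() is stable, so ties keep
    # their original order.
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winners = order[:num_channels]
    # Each winner displaces exactly one bidder: the highest loser. That
    # displaced value is the winner's VCG payment (0 if nobody loses).
    price = bids[order[num_channels]] if len(bids) > num_channels else 0.0
    return {i: price for i in winners}


if __name__ == "__main__":
    # Four users bid for two channels.
    print(vcg_unit_demand([5.0, 3.0, 8.0, 1.0], 2))
```

Because a winner's payment does not depend on its own bid in this setting, bidding the true valuation is a dominant strategy, which is the usual reason for choosing a VCG-style rule.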

Abstract:

A game model based on the Vickrey-Clarke-Groves (VCG) mechanism is presented for dynamic spectrum allocation; it addresses the complexity of dynamic spectrum allocation and reduces the information exchanged during allocation. Further, a partially observable Markov decision process (POMDP) reinforcement learning algorithm is presented. By observing and compiling statistics on historical information, the secondary users raise the reward of their bidding strategy through continuous learning, so as to obtain the optimal bidding strategy. Finally, the POMDP reinforcement learning problem is transformed into optimal-strategy learning for a belief Markov decision process (MDP), which is solved by the value iteration algorithm. The simulation results show that the POMDP reinforcement learning algorithm can significantly improve the performance of dynamic spectrum allocation.

Key words: dynamic spectrum allocation, Vickrey-Clarke-Groves mechanism, cognitive network
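The step from a POMDP to a belief MDP mentioned in the abstract rests on the standard Bayesian belief update, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). The sketch below illustrates only that update; it is not the authors' implementation, and the function name `belief_update`, the table layouts `T[a][s][s2]` and `O[a][s2][o]`, and the two-state channel example are assumptions made for illustration. Value iteration over the resulting beliefs is not shown.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes update of a belief over hidden states.

    T[a][s][s2] is the probability of moving from s to s2 under action a.
    O[a][s2][o] is the probability of observing o after landing in s2.
    Both layouts are assumptions for this sketch.
    """
    n = len(belief)
    unnorm = [
        O[action][s2][observation]
        * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnorm]


if __name__ == "__main__":
    # Hypothetical example: state 0 = channel idle, state 1 = channel busy,
    # a single action 0, observation 0 = "sensed idle".
    T = [[[0.9, 0.1], [0.2, 0.8]]]
    O = [[[0.8, 0.2], [0.3, 0.7]]]
    print(belief_update([0.5, 0.5], 0, 0, T, O))
```

The updated belief is itself a fully observed state, which is what allows an MDP method such as value iteration to be applied to it.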